Be able to describe what risk factors are and how they are used in understanding and managing disease
Be able to calculate prevalence and incidence
Be able to describe the roles of cohort studies
Be able to describe the key sources of bias in cohort studies and how these biases are minimised
Risk factors are factors that predict disease; they are not necessarily causal
Biomarkers measure biological processes and are used in a large number of ways, including to measure exposure, response to treatment and disease prognosis
Restriction is an important method for reducing bias in cohort studies (e.g. restricting to incident (new) users of the intervention under study)
Measure disease/outcome incidence
Identify risk factors for the disease/outcome
Measure the effect of exposure to a risk factor on the incidence of a disease or outcome
Risk is the probability of an event in a defined period of time
A risk factor is a characteristic that is associated with an increased risk of the disease or outcome
Prevalence is the fraction of a group of people possessing a clinical condition at any one time
Incidence is the fraction of a group of people initially free of disease that develop the condition over a period of time
Cohort study started in 1948 in Framingham, Massachusetts (close to Boston)
The study set out to better understand the causes of cardiovascular disease
The study started with a sample of people living in Framingham (the population was predominately white and middle class)
Participants provided a medical history and had a physical examination; this was repeated at regular intervals
Most participants were free from cardiovascular disease at time of recruitment
As participants start to develop cardiovascular disease it is possible to assess risk factors
Early key results include the identification of high cholesterol, high blood pressure and physical inactivity as risk factors for cardiovascular disease
The Framingham Heart Study has included genetic data since 2006
Over time a more diverse population was recruited in addition to offspring of the original participants
Predict disease
May or may not be causal
May or may not be modifiable (e.g. gender and cardiovascular disease)
Can have a long latency period (e.g. the effects of high cholesterol on coronary heart disease take decades to appear)
Often lead to a small increase in risk
Often combine to increase risk; multiple risk factors for a single disease (e.g. hypertension, smoking and male gender combine to increase risk of coronary heart disease)
Single risk factors can lead to an increase in multiple diseases (e.g. hypertension increases risk of stroke, heart failure and coronary heart disease)
A defined characteristic that is measured as an indicator of normal biological processes, pathogenic processes, or responses to an exposure or intervention, including therapeutic interventions. Molecular, histologic, radiographic, or physiologic characteristics are types of biomarkers. A biomarker is not an assessment of how an individual feels, functions, or survives.
FDA-NIH Biomarker Working Group (2016)
| Biomarker type | FDA definition | Example |
|---|---|---|
| Susceptibility/risk biomarker | A biomarker that indicates the potential for developing a disease or medical condition in an individual who does not currently have clinically apparent disease or the medical condition. | LDL and cardiovascular disease |
| Diagnostic biomarker | A biomarker used to detect or confirm presence of a disease or condition of interest or to identify individuals with a subtype of the disease. | Troponin and acute coronary syndrome |
| Monitoring biomarker | A biomarker measured repeatedly for assessing status of a disease or medical condition or for evidence of exposure to (or effect of) a medical product or an environmental agent. | HbA1c and T2DM |
| Prognostic biomarker | A biomarker used to identify likelihood of a clinical event, disease recurrence or progression in patients who have the disease or medical condition of interest. | Hypertension and secondary cardiovascular risk |
| Predictive biomarker | A biomarker used to identify individuals who are more likely than similar individuals without the biomarker to experience a favorable or unfavorable effect from exposure to a medical product or an environmental agent. | Thiopurine methyltransferase genotype prior to treatment with 6-mercaptopurine |
| Response biomarker | A biomarker used to show that a biological response, potentially beneficial or harmful, has occurred in an individual who has been exposed to a medical product or an environmental agent; includes pharmacodynamic biomarkers and surrogate endpoint biomarkers. | HbA1c, blood pressure, LDL in cardiovascular disease |
| Safety biomarker | A biomarker measured before or after an exposure to a medical product or an environmental agent to indicate the likelihood, presence, or extent of toxicity as an adverse effect. | Serum potassium levels in patients taking ACE inhibitors |
Validated surrogate endpoint: an endpoint supported by a clear mechanistic rationale and clinical data providing strong evidence that an effect on the surrogate endpoint predicts a specific clinical benefit
Reasonably likely surrogate endpoint: an endpoint supported by strong mechanistic and/or epidemiologic rationale such that an effect on the surrogate endpoint is expected to be correlated with an endpoint intended to assess clinical benefit in clinical trials, but without sufficient clinical data to show that it is a validated surrogate endpoint
Candidate surrogate endpoint: an endpoint that is still under evaluation for its ability to predict clinical benefit
Analytical validity: does the method for measuring the biomarker accurately measure the biomarker?
Clinical validity: does the biomarker accurately measure or predict the clinical endpoint/outcome?
Clinical utility: does use of the biomarker to inform care improve clinical outcomes?
Prevalence = number of cases/total population at a specified time
Incidence = number of new cases/total population at risk per unit of time
Where: ‘at risk’ = ‘free from disease or condition’
Prevalence: fraction of participants with condition at any given point in time
Start of 2010: 4/10 000
End of 2010: 5/9 996
Incidence: fraction of participants initially free of condition who develop the condition over a given time
During the 3-year study: 16/9 996 (4 participants had lung cancer at the start of the study)
During 2012: 5/9 985 (15 participants had lung cancer prior to the start of 2012)
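As a minimal sketch, the prevalence and incidence figures above can be reproduced in Python. The function and variable names are illustrative assumptions rather than part of the original notes; the counts are the ones quoted in the example.

```python
def prevalence(cases, population):
    """Fraction of the population with the condition at one point in time."""
    return cases / population


def cumulative_incidence(new_cases, population_at_risk):
    """Fraction of people initially free of the condition who develop it."""
    return new_cases / population_at_risk


cohort_size = 10_000

# Prevalence at the start of 2010: 4 existing cases in the whole cohort.
prev_start_2010 = prevalence(4, cohort_size)

# Incidence over the 3-year study: the 4 existing cases are not at risk.
at_risk_study = cohort_size - 4
inc_study = cumulative_incidence(16, at_risk_study)

# Incidence during 2012: the 15 earlier cases are not at risk.
at_risk_2012 = cohort_size - 15
inc_2012 = cumulative_incidence(5, at_risk_2012)

print(f"Prevalence, start of 2010: {prev_start_2010 * 10_000:.1f} per 10 000")
print(f"Incidence, 3-year study:   {inc_study * 10_000:.1f} per 10 000 at risk")
print(f"Incidence, during 2012:    {inc_2012 * 10_000:.1f} per 10 000 at risk")
```

The key design point is the denominator: prevalence divides by everyone, whereas incidence divides only by those free of the condition at the start of the period.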
Incidence density = number of new cases/number of person years at risk of exposure
Incidence density accounts for different follow-up times
A cohort study seeks to determine incidence density of dementia in an elderly population
The study goes for 2 years, but recruitment occurs over time
50 participants enrol in year 1 and 50 participants enrol in year 2
There are 8 new cases over the two years.
Number of new cases = 8
Number of person years at risk of exposure = (50 x 2) + (50 x 1) = 150 person years
Incidence density = 8/150 person years = 5.3 per 100 person years
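The dementia example can be sketched in Python as follows. The names are illustrative assumptions. Like the worked example, the sketch credits every participant with full follow-up from enrolment; a stricter calculation would stop counting person time once a participant develops the condition.

```python
def incidence_density(new_cases, person_years):
    """New cases per person year at risk."""
    return new_cases / person_years


# (number of participants, years of follow-up) for each enrolment group.
enrolment_groups = [(50, 2), (50, 1)]

person_years = sum(n * years for n, years in enrolment_groups)
rate = incidence_density(8, person_years)
rate_per_100 = rate * 100

print(f"{person_years} person years at risk")
print(f"{rate_per_100:.1f} new cases per 100 person years")
```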
\(I_E\): incidence of the disease in the exposed group
\(I_{NE}\): incidence of the disease in the non-exposed group
Attributable risk = \(I_E - I_{NE}\)
Relative risk = \(\frac{I_E}{I_{NE}}\)
A study assesses death rate from lung cancer among smokers and non-smokers
Death rate (smokers), \(I_E\) = 341.1 per 100 000 person years
Death rate (non-smokers), \(I_{NE}\) = 14.7 per 100 000 person years
Relative risk = \(I_E/I_{NE} = 341.1/14.7 = 23.2 = 2320\%\)
Attributable risk = \(I_E - I_{NE} = 341.1 - 14.7 = 326.4\) per 100 000 person years
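A short Python sketch of the two effect measures, using the death rates from the smoking example. The function names are illustrative assumptions.

```python
def relative_risk(i_e, i_ne):
    """Ratio of incidence in the exposed to incidence in the non-exposed."""
    return i_e / i_ne


def attributable_risk(i_e, i_ne):
    """Excess incidence in the exposed compared with the non-exposed."""
    return i_e - i_ne


# Lung cancer death rates per 100 000 person years.
i_e = 341.1   # smokers (exposed)
i_ne = 14.7   # non-smokers (non-exposed)

rr = relative_risk(i_e, i_ne)
ar = attributable_risk(i_e, i_ne)

print(f"Relative risk: {rr:.1f}")
print(f"Attributable risk: {ar:.1f} per 100 000 person years")
```

Relative risk is unitless and describes the strength of the association; attributable risk keeps the units of the incidence measure and describes the absolute excess in the exposed group.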
We want to compare the rate of an adverse effect in patients taking empagliflozin (new drug) with the rate in patients taking metformin (old drug) using registry data.
One of the challenges we will need to contend with is that current users of the new drug are very different to current users of the old drug.
Current users of empagliflozin have started the drug in the past 1–2 years.
Some patients in the metformin group might have been taking metformin for 5–10 years (or longer). Patients who didn’t tolerate metformin, or didn’t respond adequately, are less likely to be represented among these users than among users in the empagliflozin group.
This can introduce a form of survivor bias.
When comparing “treated” and “untreated” patients in a dataset, we need a way to determine who we will consider as “treated”.
For example, an individual might be considered “treated” if they received a prescription for the drug in the previous 12 months.
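One way such a rule might be written down is sketched below in Python. The function name, the 365-day window and the choice to count from a reference ("index") date are all assumptions made for illustration; a real study would define these in its protocol.

```python
from datetime import date, timedelta


def is_treated(prescription_dates, index_date, window_days=365):
    """True if any prescription falls in the window before index_date.

    The window covers the window_days days up to, but not including,
    index_date (an assumed convention).
    """
    window_start = index_date - timedelta(days=window_days)
    return any(window_start <= d < index_date for d in prescription_dates)


index_date = date(2020, 12, 1)

# Hypothetical patients: one with a recent prescription, one without.
recent = is_treated([date(2020, 3, 1)], index_date)
lapsed = is_treated([date(2018, 1, 1)], index_date)

print(recent, lapsed)
```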
We want to assess whether proton pump inhibitors increase risk of pneumonia. We have a dataset that includes medicines prescribed and diagnoses, and we observe a correlation between proton pump inhibitor use and pneumonia.
It might be that proton pump inhibitors increase risk of pneumonia.
Alternatively, it might be that early symptoms of pneumonia are being treated with proton pump inhibitors.
To the extent that the correlation is explained by the treatment of symptoms related to pneumonia, this is an example of confounding by indication
An important method that can reduce bias in cohort studies is the new-user (or incident-user) design.
In these cohort studies, the participant group is restricted to new users.
To the extent this can be achieved it will reduce survivor bias, immortal time bias and confounding by indication.
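The restriction to new users can be sketched in Python as below. This is only one possible way to operationalise it: the record layout, the function name, the 365-day lookback ("washout") requirement and all patient data are hypothetical and chosen for illustration.

```python
from datetime import date, timedelta


def new_users(prescriptions, registration, study_start, washout_days=365):
    """Return {patient_id: first prescription date} for new users only.

    prescriptions: iterable of (patient_id, prescription_date) for one drug.
    registration:  {patient_id: date the patient entered the registry}.
    A patient counts as a new user if their first recorded prescription
    is on or after study_start and they were already in the registry for
    at least washout_days before it (so earlier use would have been seen).
    """
    first_rx = {}
    for patient_id, rx_date in prescriptions:
        if patient_id not in first_rx or rx_date < first_rx[patient_id]:
            first_rx[patient_id] = rx_date

    washout = timedelta(days=washout_days)
    return {
        pid: rx_date
        for pid, rx_date in first_rx.items()
        if rx_date >= study_start and registration[pid] <= rx_date - washout
    }


# Hypothetical registry extract.
prescriptions = [
    ("A", date(2015, 6, 1)), ("A", date(2020, 2, 1)),  # long-term user
    ("B", date(2020, 3, 1)),                           # new user
    ("C", date(2019, 2, 1)),                           # too little lookback
]
registration = {
    "A": date(2010, 1, 1),
    "B": date(2012, 1, 1),
    "C": date(2018, 12, 1),
}

cohort = new_users(prescriptions, registration, study_start=date(2019, 1, 1))
print(cohort)
```

Patient A is excluded as a prevalent user, and patient C is excluded because the registry cannot rule out earlier use; only patient B enters the cohort, with follow-up starting at the first prescription.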
Methods to reduce confounding
Randomisation: random allocation of participants to treatment
Restriction: limits the range of characteristics of patients in the study (e.g. active comparator, new user design)
Matching: pairs participants in one group with participants in another group so that they share comparable characteristics
Stratification: compares rates within subgroups (strata) with otherwise similar probability of the outcome
Multivariable adjustment: adjusts for differences in a large number of factors related to the outcome using modelling techniques
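To illustrate how comparing rates within strata can remove confounding, here is a small Python sketch. The counts are invented purely for illustration (they are not from any study), and the names are assumptions. The exposed group is mostly older, and older people have a higher event rate regardless of exposure.

```python
# Hypothetical data: (events, person years) by age stratum and exposure.
strata = {
    "younger": {"exposed": (10, 1000), "unexposed": (40, 4000)},
    "older": {"exposed": (200, 4000), "unexposed": (50, 1000)},
}


def rate(events, person_years):
    return events / person_years


def rate_ratio(groups):
    """Rate in the exposed divided by rate in the unexposed."""
    return rate(*groups["exposed"]) / rate(*groups["unexposed"])


# Crude comparison: pool events and person years across strata.
crude = {
    group: tuple(sum(s[group][i] for s in strata.values()) for i in range(2))
    for group in ("exposed", "unexposed")
}
crude_rr = rate_ratio(crude)

# Stratified comparison: one rate ratio per stratum.
stratum_rr = {name: rate_ratio(groups) for name, groups in strata.items()}

print(f"Crude rate ratio: {crude_rr:.2f}")
for name, rr in stratum_rr.items():
    print(f"Rate ratio in {name} stratum: {rr:.2f}")
```

In this constructed example the crude rate ratio is well above 1, yet within each age stratum the exposed and unexposed rates are equal, so the crude association is explained entirely by age.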
Does the study answer a focused research question?
Was there a representative and well-defined sample?
Was there an inception cohort?
Was the exposure measured accurately?
Were the outcomes measured accurately? (objective, subjective, masking)
Were important prognostic factors considered?
Was the follow-up of participants sufficiently long and complete?
FDA-NIH Biomarker Working Group. (2016). BEST (Biomarkers, EndpointS, and other Tools) Resource. https://doi.org/10.1164/rccm.201301-0153OC